Unravelling Selection Shifts Among Foot-and-Mouth Disease Virus (FMDV) Serotypes

FMDV virus has been increasingly recognised as the most economically severe animal virus with a remarkable degree of antigenic diversity. Using an integrative evolutionary and computational approach we have compelling evidence for heterogeneity in the selection forces shaping the evolution of the seven different FMDV serotypes. Our results show that positive Darwinian selection has governed the evolution of the major antigenic regions of serotypes A, Asia1, O, SAT1 and SAT2, but not C or SAT3. Co-evolution between sites from antigenic regions under positive selection pinpoints their functional communication to generate immune-escape mutants while maintaining their ability to recognise the host-cell receptors. Neural network and functional divergence analyses strongly point to selection shifts between the different serotypes. Our results suggest that, unlike African FMDV serotypes, serotypes with wide geographical distribution have accumulated compensatory mutations as a strategy to ameliorate the effect of slightly deleterious mutations fixed by genetic drift. This strategy may have provided the virus by a flexibility to generate immune-escape mutants and yet recognise host-cell receptors. African serotypes presented no evidence for compensatory mutations. Our results support heterogeneous selective constraints affecting the different serotypes. This points to the possible accelerated rates of evolution diverging serotypes sharing geographical locations as to ameliorate the competition for the host.


Introduction
Foot-and-Mouth Disease Virus (FMDV), a single-stranded RNA virus, belongs to the Picornaviridae family (Racaniello, 2001). It has a genome of 8.5 kb that is translated into a polyprotein and processed to yield the structural (P1) and non-structural (P2 and P3) proteins (Belsham, 1993). P1 produces the four different capsid proteins (VP1 to VP4). VP1 is the most surface-exposed capsid protein (Acharya et al. 1989) and contains the major antigenic determinants A, C and D (Mateu, 1995). Site D is also formed by residues from VP2 and VP3 and it involves residues of the C-terminus of VP1 (Lea et al. 1994). The antigenic diversity of VP1 makes it an ideal candidate for epidemiological studies (Bastos et al. 2000;Bastos et al. 2001;Sangare et al. 2001;Bastos et al. 2003;Knowles and Samuel, 2003;Sangare et al. 2003).
FMDV is classifi ed into seven immunologically diverse serotypes distributed throughout the world (South Africa Territories (SAT) 1-3, Asia-1, A, O and C). It has an enormous wide spectrum of infection refl ecting its genetic and antigenic diversity. O is the most predominant of the serotypes. Type A possesses greater geographical spread than any of the other serotypes. It is also the most antigenically diverse of the Eurasian serotypes with a large amount of variants in Asia, Africa and South America (Ansell et al. 1994;Konig et al. 2001;Araujo, Montassier and Pinto 2002;Tosh, Hemadri and Sanyal 2002). To date according to the OIE/FAO World Reference Laboratory for FMD, the last type C outbreak to emerge was in East Africa and Asia in 1996. The virus type Asia1 is endemic to southern Asia and is now considered the least diverse type with a single subtype (Ansell et al. 1994). South African territories FMDV types (SAT 1-3) have a natural reservoir in the African buffalo (Syncerus caffer) (Condy et al. 1985). They are also characterised by their large degree of antigenic diversity Sangare et al. 2003).
A number of evolutionary and population parameters are believed to be responsible for the emergence and dispersal of the different virus serotypes. These constraints can be broadly classed as genetic, for example the ability of the virus to avoid the host immune response, or ecological such as host mobility and population density. Elucidating the mechanisms utilised by viruses that subsequently drive the Evolutionary Bioinformatics Online 2006:2 211-225 evolution and infectivity would critically aid in our understanding of viral epidemiology and disease prevention. Recently, Carrillo et al. (Carrillo et al. 2005) have performed an exhaustive comparative genomic analysis to identify novel functional constraints acting on the genome. Conversely, we explored all the available sequences for a gene belonging to the different FMDV serotypes. Analysis of the constraints governing FMDV VP1 protein has been previously conducted to determine the main causes for the emergence of FMDV strain CS (serotype C) (Elena, Gonzalez-Candelas and Moya, 1992). This insightful work yield precise information about the evolutionary parameters governing FMDV serotypes evolution. However, heterogeneity in the evolutionary constraints has not been explored with much detail due to a limitation in the number of sequences and computational tools. Also, no detailed study has been performed before to exhaustively explore other selective constraints at the molecular level in FMDV using an unbiased sample of sequences.
It has been long recognised that the VP1 gene differs in about 30-50% between serotypes (Knowles and Samuel, 2003). We postulate that there are differences in the serotypes ability to infect and spread that may be accounted for by heterogeneity of evolutionary pressures. In this study we test for changes in the selection constraints to which amino acid sites are subjected by examining variations in the evolutionary rates of amino acid sites (selection shifts or site-rate changes between lineages) between the different FMDV serotypes. For this analysis we use the capsid protein VP1, which is essential in cell receptor recognition and escape from the host immune response . We ask the question of what selective constraints have very likely been responsible for the emergence and spread of the different FMDV serotypes. To test for heterogeneity in selective constraints among FMDV serotypes, we conducted several types of analyses. We fi rst determined whether different selective constraints have shaped the evolution of FMDV serotypes by testing for accelerated fi xation rates of amino acid substitutions at particular lineages. As a second approach to the identifi cation of selection shifts between serotypes, we also tested functional divergence (within-site amino acid substitution rates changes throughout the phylogeny) between serotypes. Thirdly, amino acid sites responsible for within-serotype accelerated rates of evolution or functional divergence are identifi ed and their biological importance for the ability of the virus to spread and infect examined. Finally, we identifi ed the co-evolutionary relationships between amino acid sites under accelerated rates of evolution or functional divergence and other protein regions as to understand the molecular consequences of such changes in the VP1 protein of FMDV and to speculate about possible epidemiological scenarios responsible for such selective shifts. We perform an exhaustive analysis of how fi xation of amino acid replacements by positive Darwinian selection, coevolution between functionally/structurally important residues and functional divergence in VP1 has performed an effi cient strategy for FMDV to spread. This study complements previous reports on the evolution and emergence of foot-and-mouth disease serotypes and pinpoints the need for further experimental work on apparently non-important protein regions that extend beyond well-recognised antigenic sites.

Sequence data
A dataset comprising 665 FMDV VP1 sequences were downloaded from Genebank. This includes all serotypes and only comprises sequences with length greater than 400 nucleotides to account for most of the antigenic regions within the VP1 protein. Table 1 of Supplementary Information describes the serotypes and number of sequences used including the range of sampling times. Accession numbers of the sequences used in this study are provided in Table 2 of Supplementary Information. All serotypes are signifi cantly represented, both geographically and sample wise.

Sequence alignments and phylogenetic tree inference
Alignments were obtained for VP1 amino acid sequences for each individual serotype using the T-coffee program (Notredame, Higgins and Heringa 2000). We then aligned protein-coding nucleotide sequences for each one of the serotypes by concatenating triplets of nucleotides based on the amino acid alignment. An alignment including the 665 sequences was also built as explained above to test for functional divergence type I (e.g. within-site rates changes among serotypes, see below for a more detailed defi nition). Table 1. Amino acid sequence regions for the VP1 protein detected as having undergone positive selection in each FMDV serotype using the program SWAPSC. Regions in bold are those detected in SAT1 and SAT2. F and S refer to functionally and structurally important sites, respectively, as predicted by a neural network analysis. Functionality refers to the antigenicity of those sites. Non-synonymous-to-synonymous rates ratio (~) is shown for each region. To infer an accurate phylogenetic tree, we fi rst estimated the shape parameter (a) of the gamma distribution for the amino acid sequence alignment using the program AAml from the PAML package version 3.14 . This parameter takes into account both multiple amino acid substitutions per site and unequal substitution rates among sites. The substitution model WAG, the model that signifi cantly improves the loglikelihood value of the data after comparison with other models available in the program ModelTest (Posada and Crandall, 1998) using the Likelihood Ratio test (LRT), was used to estimate a. Finally, we inferred the phylogeny by neighbour-joining (Saitou and Nei, 1987) using gamma-corrected amino acid distances, estimated by maximum likelihood under the WAG model for the dataset of FMDV sequences, using the program MEGA v3.0 (Kumar, Tamura and Nei, 2004). To estimate the support for internal nodes, we inferred the phylogeny using gamma-corrected amino acid distances in MEGA and estimated support values by bootstrap analyses based on 1000 pseudoreplicates. The phylogenetic analyses using gamma-corrected amino acid distances and neighbour-joining yielded the same phylogenetic topology as that based on WAG distances.

Testing changes in selective constraints using neural networks
To test the biological (functional/structural) importance of particular amino acid sites in the VP1 protein we used two approaches. The fi rst approach was based on the collection of the functional information available for this protein. However, most of the amino acid sites in the molecule have not been tested for their functional importance since studies have been biased towards testing antigenic peptide regions. We hence conducted a neural-network analysis implemented in the program CONSEQ (Berezin et al. 2004). We used as input alignment each one of the FMDV subtype alignments (including each one a minimum of 30 sequences and a maximum of 203 sequences (Table 1 of supplementary information). Briefl y, CONSEQ derives automatically a phylogenetic tree from the query multiple sequence alignment and calculates the substitution rates at each position in the sequence alignment by maximum likelihood. A neural network analyses is thereafter conducted by CONSEQ to predict schemes to discriminate between buried and exposed residues. Although the sensitivity regarding the buried and exposed states has been reported to be of about 56% (Berezin et al. 2004), we tested the performance of the test by comparing the predictions to the positions of the amino acid sites in the available crystal structure for the VP1 protein of FMDV. The functional or structural importance of sites was determined by the conservation of the site, with signifi cantly exposed conserved sites being functionally important and conserved buried sites structurally important.

Characterising the main selective constraints in FMDV serotypes
To test if the different FMDV serotypes have been subjected to different evolutionary constraints at the molecular level, we conducted a precise analysis of selective constraints for each serotype. Normally, the intensity of selection on proteincoding sequences is measured by comparing the number of substitutions per synonymous site (d S ) to the number of replacements per non-synonymous site (d N ) (Kimura, 1977;Sharp, 1997;Akashi, 1999;Crandall et al. 1999;Yang and Bielawski, 2000). The ratio between the two rates (~ = d d S N ) helps to elucidate if the gene has been fi xing amino acid replacements neutrally (~ = 1), replacements have been removed by purifying selection (~ < 1), or mutations have been fi xed by adaptive evolution (~ > 1). Furthermore, since d S values are neutrally fi xed (except when constraints operate at the RNA level in these sites or when synonymous sites are saturated) they are proportional to the time since the sequences compared have diverged. Consequently, the normalisation of d N values by d S makes independent of time. It has been shown, however, that ~ is a poor indicator of the action of adaptive evolution (Sharp, 1997;Crandall et al. 1999). The rationale behind this is that punctual events of positive selection of advantageous non-synonymous mutations throughout the evolution of a particular sequence (episodic positive selection) may be hidden by an overwhelming number of deleterious evolutionary history (for example, the functional importance of sites has not changed along the evolution of the gene, and thus the rate of evolution of that site remained constant), something called the homogeneous gamma model or ratesacross-site (RAS) model (Gaucher et al. 2002). If the functional/structural importance of particular amino acid sites changes between lineages, their rates of evolution also changes and a non-homogenous gamma model will explain better the process of evolution.
A non-homogeneous gamma model of evolution can be statistically described by the functional divergence type I (Gu, 1999;Gu, 2001;Wang and Gu, 2001). Because of the signifi cant serotypic and geographic split between FMDV types, we tested whether this split between serotypes was reinforced by serotype-specifi c amino acid replacement rates. We tested this selection shifts by comparing the likelihood value of the hypothesis of functional divergence to that of a RAS model. This comparison was performed using the Likelihood-ratio test (LRT). We tested functional divergence type I between serotypes using the program DIVERGE version 1.04 (Gu, 1999), respectively. DIVERGE tests for functional divergence type I by estimating the parameter of functional divergence (i), which is equivalent to the anti-correlation in the evolution of the two group of sequences (e.g. FMDV serotypes) being compared. It estimates the posterior Bayesian probability for an amino acid site to belong to the group of sites responsible for functional divergence. These sites share the property of being highly constant in one of the sequence groups and variable in the other, supporting the gain/loss of a functional/structural constraint in one of the groups.

Identifying coevolution between structural/functional regions in VP1
Amongst FMDV antigenic sites in VP1 protein, Site A is involved in conferring the virus its ability to escape from the immune pressure and to recognise host cell receptors (Aggarwal and Barnett, 2002). Important structural/functional residues spatially adjacent to these sites may have contributed to the antigenicity of the VP1 protein by their functional or structural interaction. In fact, coevolution between sites surrounding functionally important protein regions allows maintaining the structural and spatial stability of active sites mutations removed by purifying selection from the same protein regions. Consequently, the number of amino acid substitutions will be below the expectation under neutrality despite these punctual events of adaptive evolution and the mean ~ values will be < 1. A way to tackle this problem is by estimating ~ value for specifi c branches of the tree and for each codon region of a protein. Unless branches are pre-specifi ed, using maximum likelihood for this kind of analysis is computationally prohibitive specially when the number of sequences is large and the model is complex.
We have used Kimura-based models to measure the intensity of selection between and within serotypes using the set of 665 sequences. To detect selective constraints we used a sliding-window based approach (Fares et al. 2002) implemented in the program SWAPSC version 1.0 (Fares, 2004). Briefl y, the program slides a statistically optimum window size along the sequence alignment to detect selective constraints and estimates the probability of d N and d S using a number of simulated data sets. We used 1000 simulated data sets in our analysis obtained using the program EVOLVER from the PAML (Yang, 2000) package, taking as initial parameters the average ~, transition-totransversion rates ratio and codon frequency table generated under the Goldman and Yang model (Goldman and Yang, 1994). These parameters were estimated from the real sequence alignment. The program then slides the window along the real sequence alignment and estimates d N and d S by the method of Li (Li, 1993). The signifi cance of these estimates is tested assuming a Poisson distribution of nucleotide substitutions along the alignment. This program was also used due to its ability to identify adaptive evolution (~ > 1) but also accelerated rates of amino acid substitutions without the restriction of ~ > 1 (that can be due to either adaptive Darwinian evolution or to the fi xation of slightly deleterious mutations by genetic drift), saturations of synonymous sites (where d S values are underestimated) and hot spots (where both d S and d N are signifi cantly high but where ~ < 1).

Testing Functional divergence between serotypes
Normally, in phylogenetic tree inference and detection of selective constraints, fast evolving sites and slow evolving sites are considered to remain fast and slow sites, respectively, throughout the entire (Gloor et al. 2005). Fixation of amino acid replacements in regions with strong functional/structural constraints can improve the biological fi tness of the virus (for example to escape from the immune response of the host). Alternatively, slightly deleterious mutations (mutations compromising the structural stability or functional performance of the protein) can be fi xed under genetic drift in or nearby functional/structural regions and be compensated by conditionally advantageous mutations (for example, mutations that are advantageous due to their compensatory effect) at interacting sites. The mutational dynamic of the protein (for example, fi xation of slightly deleterious mutations, advantageous mutations or compensatory mutations) yields information hence on the evolutionary pathways that a group of sequences (serotype) have undergone and provide support for the functional and/or structural importance of a specifi c molecular region. Analysis of the different types of mutations also provides an opportunity to test the heterogeneity in the mutational dynamics between serotypes.
We tested intra-molecular coevolution in each one of the serotypes by applying a method based on the information theoretic quantity called mutual information to measure the dependence of mutations in VP1 (see Korber et al. 1993 for details). This method measures the covariation between two amino acid sites using the Mutual Information Criterion (MIC). Mutual information is represented by the entropies that involves the joint probability distribution, P(s i , s' j ), of occurrence of symbol i at position s and j at position s'. The MIC values generated range between 0, indicating independent evolution, and a positive value whose magnitude depends on the amount of covariation. We only included in the analysis those variable positions in the alignment that were parsimony-informative (i.e. they contain at least two types of amino acids and at least two of them occur with a minimum frequency of two). The statistical power of the test is dependent on the number of sequences as well as the amount of variability at each amino acid site. In this case, both parameters were secured since the number of sequences was high and the variability as well. The signifi cance of the MIC values was assessed by randomisation of pairs of sites in the alignment, calculation of their MIC values and comparison of the real MIC values with the randomly generated distribution of MIC values. One million permutations were conducted to test MIC value signifi cance.

Heterogeneous patterns of adaptive evolution between serotypes
All serotypes (SAT 1, SAT 2, Asia1, A and O), except C and SAT3, presented evidence of positive Darwinian selection in the VP1 protein (Table 1). We have also analysed the functional/structural importance of these sites using a neural-network based method implemented in the program CONSEQ (Berezin et al. 2004). The assumption made in this program is that conserved regions at the amino acid level may have a functional or structural role. We did not consider the buried/ exposed state for the amino acid sites inferred by CONSEQ due to the low sensitivity (56%) reported for this method regarding these states. This analysis supported that regions detected as having undergone adaptive evolution present evidence of a functional or structural role (Table 1).
The comparison of the fi ve serotypes under positive selection highlights diverse patterns of selective constraints at the molecular and phylogenetic levels. Interestingly, all those serotypes under positive selection comprise a similar region around N131-R149 that is located in the immunodominant site A within the G-H loop. However only serotypes A and O, the wider geographically spread FMDV serotypes, have the full RGD motif under positive selection. Moreover, serotypes SAT1 and 2 present evidence of adaptive evolution at overlapping VP1 protein regions. In fact, all of the regions detected in SAT1 to be under adaptive evolution were also found in SAT2 to be positively selected (Table 1). Interestingly, SAT2 presented greater percentage of adaptive evolution compared to SAT1, which correlates with a wider geographical distribution of this serotype in Africa. Conversely, regions under positive selection in the widely geographically distributed serotypes (serotypes A, Asia 1 and O) presented poor overlap. Interestingly, there is more overlap in the regions under adaptive evolution between SAT serotypes and serotypes Asia1, O and A than within widely spread serotypes (for example between Asia1 and O).
If we were to classify these serotypes in terms of percentage of codon sites under positive selection the following is observed: SAT2 > ASIA 1 > A > SAT 1 > O. The difference from SAT 2 to type O represents a 10-fold increase in the percentage of sites under adaptive evolution.

Evidence for changes in selection constraints among FMDV serotypes
For each individual serotype we applied the neuralnetwork analysis to detect functional/structural sites implemented in the program CONSEQ. This analysis also enabled us to know whether sites under adaptive evolution had any functional/structural role. In the seven serotypes, the RGD motif from the VP1 protein was identifi ed as functionally important, except in the case of Asia1 and A serotypes where only the G144 and D145 were identifi ed as important.
The level of functional important sites varied signifi cantly between serotypes (Figure 1). For example, the variability of the proportion of functionally important sites when comparing the different serotypes was signifi cant ( 6 2 | = 14.034, P = 0.029). Similarly, the proportion of structurally important sites was also signifi cantly different among serotypes ( 6 2 | = 20.948, P < 0.01). Comparison of the number of sites under functional and structural constraints between serotypes uncovers an interesting pattern that suggests certain correlation between the selective constraints acting on VP1 and the functional/structural selection shifts among serotypes. In fact, those serotypes that have not undergone adaptive evolution present the greatest number of functionally and structurally important regions (140 and 145 sites for SAT3 and C serotypes, respectively). The percentage of sites under functional constraints as detected by CONSEQ were not an artefact of the amino acid variability produced by adaptive evolution since all positively selected regions were detected to be functionally/structurally important. Thus, high variability due to positive selection was not leading to an underestimation of functional/structural important sites.

Functional divergence between FMDV serotypes
FMDV serotypes were compared to each other and regions under functional divergence type I determined. Only amino acid sites showing posterior Bayesian probabilities greater than 0.95 were considered as having been subjected to functional divergence. In all the pairwise comparisons between serotypes, most of the sites involved in functional divergence were found in the carboxyterminal half of the molecule, affecting antigenic sites A, C and D.
Functional divergence type I between serotypes followed a trend in agreement with the geographical pattern observed in the analysis of coevolution and functional/structural sites distribution. Functional divergence affected a signifi cantly greater number of sites when the serotypes compared belonged to different-continent locations (t 19 = 2.538, P = 0.02; Table 2). Moreover, widespread viruses (O, C and A) and that in the Asian continent (Asia 1) presented lower functional divergence between each other compared to the number of sites under functional divergence when serotypes confi ned to the African continent (SAT1, 2 and 3) were compared (t 7 = 3.789, P = 0.007; Table 2). The distribution of sites under functional divergence also changed depending on the serotypes compared. Antigenic sites A, C and D were massively affected by functional divergence when African serotypes were compared ( Figure 2). Conversely, comparison

Intra-molecular co-variation generates FMDV immune-escape mutants
To determine if specifi c residues are important for the antigenicity of the virus through their functional/physical interaction with antigenic sites, we conducted an amino acid covariation analysis developed by Korber et al., (Korber et al. 1993). This method is implemented in the program PIMIC (Codoner, Elena and Fares, unpublished and available on request). Sites detected as coevolving under this approach were, in all the serotypes, located in the antigenic sites A, C, D, and antigenic site 3. There was also a strong correlation between those sites coevolving and those under positive selection. In fact, the percentage of co-evolving sites under positive selection were 100% (i.e. all sites under co-evolution were positively selected) in SAT2, 83% in A, 63% in Asia1, 33% in SAT1 and 21% in O (Figure 3). Results were not biased by the fact that positively selected sites are more prone to be detected as coevolving because in some serotypes sites under positive selection were not detected as coevolving and vice versa. Interestingly, a pattern emerges that correlates the geographic distribution of the serotype with the molecular distribution of sites under adaptive evolution or co-evolution. For example, for those serotypes widely distributed (O and A) or covering an overlapping geographical area (O, A and Asia1) several sites under adaptive evolution are found within the same group of coevolution ( Figure 4A). In addition, several of the amino acid sites detected by SWAPSC to be under accelerated rates of evolution were detected as coevolving with sites under positive selection. When adjacent sites to these accelerated amino acid residues were analysed by CONSEQ, all of them were found to be functionally important. For example, the accelerated amino acid site 3 in serotype O is coevolving with the positively selected site 4, both belonging to the coevolution group G5 ( Figure 4A). This site is in physical contact (less than 4Å distant) with functionally important regions as predicted by CONSEQ ( Figure 4B). The same holds for amino acid site 143 that is coevolving in several groups of co-evolution with sites under positive selection (139, 140, 141 and 142; Figure 4A). This site is contacting or is signifi cantly close to the RGD motif ( Figure 4B). Given their location, these sites are expected to be slightly deleterious and may have been thus compensated by conditional positively selected mutations as explained above. Each amino acid coevolve with all those from the same group. Underlined amino acids are those detected to have undergone adaptive evolution. B) The three-dimensional structure of the VP1 protein of FMDV (PDB entry 1QGC). Coevolving amino acid sites are highlighted in space-fi ll structure, whilst functional sites are defi ned by the ball-and-stick structure. Sites in black are those positively selected; whilst sites in grey represent those showing accelerated rates of evolution (where non-synonymous substitutions are signifi cantly high but where non-synonymous-to-synonymous rates ratio is below 1). Functionally important sites (Func), predicted using the program CONSEQ, are marked. The RGD tri-peptide in the antigenic site A is also shown.
Interestingly, these sites (accelerated amino acid sites and positively selected sites) did always coincide phylogenetically further demonstrating their correlated variation. These data support thus the compensatory effect of positively selected sites in the widely distributed virus serotypes because the conditions to consider mutations as compensatory (for example, they have to coevolve with amino acid sites that have undergone accelerated rates of fi xation of amino acid substitutions and these sites have to be nearby functional/structural regions) are met. Their advantageous effect however would be considered conditional to the emergence of slightly deleterious mutations and the general effect of both types of mutations is thus neutral. There was no evidence for compensatory mutations in African serotypes. In these serotypes, sites under adaptive evolution were generally included in different coevolution groups ( Figure 5A). No single site under accelerated rate of evolution (with no positive selection) was found to co-evolve with spatially close positively selected sites ( Figure 5B).
All four antigenic regions presented evidence of co-evolution between each other. Interestingly, the BC loop presented the highest percentage of co-evolving sites with antigenic sites A, C and D compared to other VP1 regions. A trend emerges that shows that serotype A has the greatest number of co-evolving sites in antigenic regions closely followed by C, then types O, SAT2, ASIA1, SAT1 and SAT3.

Discussion
Previous works have attempted to answer the question of whether the appearance of a new serotype is subjected to a change in evolutionary constraints. Elena and colleagues (1992) showed that a constant rate model of evolution in the different serotypes explains the evolution of three of the FMDV serotypes. They also demonstrated that episodic positive Darwinian selection might have been operating during the evolution of CS serotype. Since this statistically elegant study was performed, several new FMDV sequences were released and statistical and computational methods developed to conduct more detailed analyses of selective constraints. In this work we have conducted a detailed and comprehensive analysis of selective constraints in the different serotypes of FMDV using all the available sequences. The importance of FMDV and the amount of data available make this virus a good system to conduct a comparative evolution analysis of virus serotypes such as the one performed here. We would however like to state that other FMDV genomic regions might clarify or modify the conclusions shown in this work. In fact, previous works have shown that the FMDV host range can be governed by non-structural proteins such as the 3A and 3B proteins Pacheco et al. 2003). Analyses of other genomic regions may provide valuable information to confi rm many of the conclusions in this work.

Heterogeneous selective constraints and the emergence of immuneescape mutants in FMDV serotypes
Adaptive evolution in FMDV has been previously detected in natural isolates and has been associated with the emergence of immune-escape and persistent mutants Tosh et al. 2003;Mittal et al. 2005). Because the method used in this study examines specifi c branches of the tree and regions of the alignment for adaptive evolution, additional sites to those observed previously have been detected under positive selection in FMDV type A. Splitting the evolutionary time hence in as many temporal points as branches in the FMDV tree makes the non-synonymous-to-synonymous rates ratio a more sensitive approach to detect episodic changes on selective constraints. Our results are also consistent with those of Haydon et al. ) as we do not fi nd any signifi cant level of positive selection in types C and SAT3 but do detect a considerable amount in type O. Interestingly, C and SAT 3 serotypes have almost gone to extinction to this date. As a limitation of the method used here to detect selective constraints is the fact that, even though the SWAPSC software is applicable to RNA data, selective constraints operating at synonymous sites to maintain a stable secondary RNA structure may infl ate ~ values due to the purifying selection acting on the RNA molecule. However, this would affect all the regions in the protein and thus selective constraints would not be concentrated in few biologically important protein regions.
On examination of the sites under adaptive evolution we fi nd that G-H loop that ranges from amino acids 131-149 have undergone positive selection in all the serotypes. Within SAT serotypes we fi nd Site 3 and antigenic region C under positive selection. Most of the regions under positive selection in SAT serotypes are not found in non-African serotypes, sparking speculation that there is a difference in the selective pressures governing the SAT types compared to non-African serotypes. On the other hand, the difference in the percentage of sites that have undergone positive selection is very noticeable, with SAT2 having a signifi cantly greater percentage of sites undergoing selection compared to SAT1. This coincides with the fact that SAT2 is widely distributed in the African continent whereas SAT1 is confi ned to specifi c African regions, yet both serotypes overlap in some African territories. A plausible explanation for this correlation is that SAT2 has accumulated amino acid replacements in its VP1 protein that permitted the generation of  immune-escape mutants and the emergence of variants in SAT1-SAT2 overlapping geographical areas as to minimize the competition by utilizing different niches in these areas (adaptive radiation). If this hypothesis were true, we would expect greater accelerated rates of evolution in the VP1 protein from serotypes in overlapping geographical areas compared to the same serotypes in nonoverlapping areas. Examination of the Poissoncorrected amino acid distances between serotypes in overlapping regions reveals faster evolutionary rates for data sets of sequences in overlapping geographical regions for all the serotypes ( Figure 6). This fact is independent of different sequence sampling dates because all the sets included sequences from similar sampling dates and were taken from the same countries. Although the analyses here can be linked to the ability of SAT1 and SAT2 to infect other hosts in addition to establish persistent infections in the buffaloes, further analyses are required to confi rm the direct relationships between host range and pathogenesis and the functional divergence between both serotypes. For instance, analysing genes responsible for the host range and pathogenicity such as 3A and 3B might shed more light to this problem.
Host-cell receptor recognition and antigenicity may be linked by positive selection and coevolution Amino acid sites detected to be under adaptive evolution in FMDV serotypes are mostly located in antigenic regions known to interact with both, integrin receptors (Jackson et al. 1996;Jackson et al. 2000;Jackson et al. 2002) and neutralising antibodies (Verdaguer et al. 1995;Ochoa et al. 2000). It is remarkable the detection of positive selection in the RGD motif despite its direct involvement in the interaction with the host cell receptors. Experimental dispensability of the RGD tri-peptide for cell entry expands greatly the repertoire of antigenic variants of FMDV with substitutions at the antigenic site A (Martinez et al. 1997;Ruiz-Jarabo et al. 1999). FMDV exhibits an astonishing fl exibility in receptor usage, as substitutions within the RGD or at neighbouring sites have been isolated in viruses from lesions in immune cattle challenged with a highly virulent FMDV (Taboga et al. 1997). The co-evolution of positively selected sites in antigenic regions, as shown in this work, also involved in the interaction with alternative cell surface receptors complements this conclusion. This co-evolution can be promoted by the position of the mobile loop ensuring interaction of residues in the G-H loop with residues located in distant regions of the VP1 protein such as the B-C loop . We also concur with Aggarwal and Barnett (Aggarwal and Barnett, 2002) who suggested that unidentifi ed sites outside of antigenic regions may also be important in the immune response. In this respect, our work shows non-antigenic regions under positive selection co-evolving with known antigenic sites.
A geographical-specifi c pattern of adaptive coevolution in FMDV Examination of the three-dimensional localisation of coevolving sites suggests fi xation of mutations with compensatory effects in VP1 from serotypes with a wide geographical distribution. In fact, in these serotypes we identifi ed fast-evolving sites co-evolving with positively selected sites. These accelerated sites are in contact with functionally important sites as highlighted in the threedimensional structures. Mutations at these sites may have important negative effects on the functional regions that must be compensated by advantageous mutations at adjacent sites. Both types of mutations together would have neutral effects on the protein's function. In addition, for those accelerated sites affecting important regions embedded in complex local structures, such as the RGD motif, several advantageous mutations at nearby sites would be needed to compensate for the deleterious effect of accelerated sites. This would explain the signifi cant number of positively selected sites belonging to the same co-evolution group in the case of these serotypes. Conversely, those viruses with a distribution confi ned to the African continent showed no indication of compensatory mutations. This pattern of coevolution may have probably been produced by fi xation of slightly deleterious mutations in the widely distributed serotypes (for example, serotypes O and A) subjected to population bottlenecks due to the high host geographical mobility. Caution is however stressed in interpreting these results since high host population densities may ameliorate the effects of genetic drift caused by host mobility. Conversely, SAT serotypes are confi ned to specifi c areas of the African continent probably infecting persistently the host (for example, they have their reservoir in the African buffalo) through improving their ability to generate highly competitor immuneescape mutants. In addition, compensatory mutations and the co-evolution among antigenic sites, as demonstrated in this work, would permit fi xing immune-escape mutations probably maintaining the ability to recognise host-cell receptors in these serotypes. This is in agreement with previous studies suggesting the co-evolution of antigenicity and receptor usage (Mateu et al. 1996;Domingo et al. 1999;Baranowski et al. 2001). Furthermore, the fact that mutations within the B-C loop can infl uence the conformational stability of the G-H loop (Parry et al. 1990) supports the importance of the co-evolving sites in both loops detected here.
A geographical-based pattern of functional divergence between FMDV serotypes Our analysis supports that FMDV serotypes geographically distant have diverged functionally/ structurally during their independent evolution. In contrast to the African serotypes that show a geographical confi nement, serotypes in the Asian continent present evidence for inter-serotype contact due to their lower functional divergence. This has been probably produced due to the greater mobility of the virus or the host that could eventually lead to bottlenecks on the viral population.

Virus variability and survival
In this manuscript we give details about the complex dynamics of evolution of an economically important virus. Variability in RNA viruses caused by high errors rates during virus replication seems to be important for the survival of virus populations under changing environmental conditions. It has been recently shown that low fi delity increases virus pathogenesis and that selective pressures are operating at the population level (Pfeiffer and Kirkegaard 2005;Vignuzzi et al. 2006). The main conclusion of these studies is that pathogenesis is supported by Quasispecies diversity. High fi xation of rates of amino acid replacements in FMDV can be hence instrumental in the survival of the viral population per se.

Sampling bias
Sampling limitations may indeed play a factor in the interpretation of these results. In order to provide accurate refl ections of the disease, there is need for complete temporal, spatial coverage and full recording of all disease events. This study attempts to maximise the geographic distribution and range of sampling times for sequences in order to obtain a random and true representation of the worldwide situation rather than single episodes or epizootics. Indeed it is inevitable that the current sequences sampled here are only a subset of those actually in nature but nevertheless they aid in our understanding of the disease. In any case, this is to our knowledge the most comprehensive study done on the evolutionary constraints on FMDV serotypes.